An Empirical and Formal Analysis of Decision Trees for Ranking
Authors
Abstract
Decision trees are known to be good classifiers but less good rankers. A few methods have been proposed to improve their performance in terms of AUC, along with first empirical evidence showing their effectiveness. The goal of this paper is twofold. First, by replicating and extending previous empirical studies, we not only improve the understanding of earlier results but also correct implicit assumptions and conjectures. We focus on the dependence of AUC on the pruning level, the effect of Laplace correcting the class frequencies in leaf nodes, and the influence of the number of distinct scores produced by a decision tree. Second, we complement the empirical studies by a formal analysis that offers further insights and explanations. Essentially, we show that the AUC is likely to increase with the number of scores produced by a tree, at least when these scores are reasonable (better than random) approximations of the true conditional probabilities in the leaves. Simulation studies with synthetic data verify our formal analysis and show its robustness toward estimation errors. As a byproduct, our result suggests a simple method for improving the ranking performance of decision trees, and it helps to understand why some classifiers are better rankers than others.
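The effect of Laplace correction on the number of distinct scores, and through it on AUC, can be illustrated with a minimal Python sketch. It is not the paper's experimental code: the leaf counts and the test cases are made up for illustration, the helper names `leaf_score` and `auc` are hypothetical, and the two-class Laplace estimate (k + 1) / (n + 2) is assumed. AUC is computed as the Wilcoxon-Mann-Whitney statistic, with a tied positive/negative pair counting one half.

```python
from fractions import Fraction

def leaf_score(pos, neg, laplace):
    """Estimated P(positive) in a leaf holding `pos` positive and `neg` negative training cases."""
    if laplace:
        # Two-class Laplace correction: (k + 1) / (n + 2)
        return Fraction(pos + 1, pos + neg + 2)
    # Plain relative frequency; assumes the leaf is not empty
    return Fraction(pos, pos + neg)

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical tree with three leaves: (positives, negatives) seen in training
leaves = {"A": (3, 0), "B": (1, 0), "C": (1, 3)}

# Hypothetical test cases: (leaf the case falls into, true label)
test = [("A", 1), ("A", 1), ("B", 1), ("B", 0), ("C", 0), ("C", 0)]
labels = [y for _, y in test]

raw = {name: leaf_score(p, n, laplace=False) for name, (p, n) in leaves.items()}
lap = {name: leaf_score(p, n, laplace=True) for name, (p, n) in leaves.items()}

raw_auc = auc([raw[leaf] for leaf, _ in test], labels)
laplace_auc = auc([lap[leaf] for leaf, _ in test], labels)

print(len(set(raw.values())), "distinct raw scores, AUC =", raw_auc)
print(len(set(lap.values())), "distinct Laplace scores, AUC =", laplace_auc)
```

In this toy setting the two pure leaves A and B receive the same relative frequency, so the uncorrected tree produces only two distinct scores. Laplace correction separates them because the leaf with more training cases moves further from 1/2, and here the extra score resolves ties in the right direction. That outcome depends on the made-up data and should not be read as a general guarantee.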
Similar resources
A New Balancing and Ranking Method based on Hesitant Fuzzy Sets for Solving Decision-making Problems under Uncertainty
The purpose of this paper is to extend a new balancing and ranking method to handle uncertainty for a multiple attribute analysis under a hesitant fuzzy environment. The presented hesitant fuzzy balancing and ranking (HF-BR) method does not require attributes’ weights through the process of multiple attribute decision making (MADM) under hesitant conditions. For the rating of possible alternati...
Learning to Rank Cases with Classification Rules
An advantage of rule induction over other machine learning algorithms is the comprehensibility of the models, a requirement for many data mining applications. However, many real life machine learning applications involve the ranking of cases and classification rules are not a good representation for this. There have been numerous studies to incorporate ranking capability into decision trees, bu...
Ranking Cases with Classification Rules
Many real-world machine learning applications require a ranking of cases, in addition to their classification. While classification rules are not a good representation for ranking, the human comprehensibility aspect of rules makes them an attractive option for many ranking problems where such model transparency is desired. There have been numerous studies on ranking with decision trees, but not m...
A Proposed Combination Method for Ranking Options in Multi-Criteria Decision Making by Data Envelopment Analysis and Common Set of Weights
The purpose of this paper is to fully rank decision-making units using a combination of multi-criteria decision-making techniques and data envelopment analysis. Because weights play an important role in ranking the options by multi-criteria decision-making methods, and most of these methods have weaknesses in their weighting methods, the ability for data envelopment anal...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Maryam Ashoori (1, *), Fatemeh Hamzavi (2). (1) School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran; (2) School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran. Abstract. Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...